Predicting L2 Misses to Increase Issue-Queue Efficacy

Authors

  • Enric Morancho
  • José María Llabería
  • Àngel Olivé
Abstract

The issue queue holds instructions that are waiting for their input operands and for issue slots to become available. While most instructions remain in the issue queue for only a few cycles, instructions dependent on L2 misses may remain there for hundreds of cycles due to the L2 miss latency. Some authors have proposed mechanisms to extract these instructions from the issue queue. However, these mechanisms increase issue-queue activity because the extracted instructions must be replayed, that is, issued at least twice: first to be extracted from the issue queue, and again, after the L2 miss is resolved, to be executed. We propose delaying the insertion of some instructions into the issue queue. After predicting which load instructions are going to miss in L2, the instructions dependent on these loads are stored in an instruction buffer instead of being inserted into the issue queue. Once the miss is resolved, the instruction buffer is traversed to insert into the issue queue the instructions dependent on the resolved memory access. Compared with proposals that extract L2-miss-dependent instructions from the issue queue, this approach has two advantages. First, it avoids filling the issue queue with instructions dependent on L2 misses. Second, it reduces the number of instruction replays. The evaluations show that delaying the insertion of instructions into the issue queue reduces the number of instruction replays by 27% to 31% in integer benchmarks and by 33% to 39% in floating-point benchmarks with respect to processors that extract L2-miss-dependent instructions from the issue queue. The evaluations also show that this replay reduction does not harm processor performance.

Index Terms — Instruction issue, L2 hit/miss prediction, instruction replays.
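The deferred-insertion idea from the abstract can be sketched in a few lines: a predicted-miss load dispatches normally, but its (transitive) dependents are parked in a side wait buffer instead of occupying issue-queue entries, and are moved into the issue queue only when the miss resolves. This is a minimal toy model, not the paper's implementation; all class, method, and instruction names here are hypothetical.

```python
from collections import defaultdict, deque

class DeferredInsertScheduler:
    """Toy model of deferred issue-queue insertion for L2-miss dependents.

    predict_l2_miss is a hypothetical per-PC predictor: load PC -> bool.
    """

    def __init__(self, predict_l2_miss):
        self.predict_l2_miss = predict_l2_miss
        self.issue_queue = deque()
        self.wait_buffer = defaultdict(list)  # load id -> parked dependents
        self.waits_on = {}                    # instr id -> unresolved load id

    def dispatch(self, instr, srcs, is_load=False, pc=None):
        # If any source (transitively) depends on an unresolved predicted
        # miss, park this instruction in the wait buffer instead.
        for s in srcs:
            if s in self.waits_on:
                load = self.waits_on[s]
                self.wait_buffer[load].append(instr)
                self.waits_on[instr] = load
                return "wait_buffer"
        if is_load and self.predict_l2_miss(pc):
            # The load itself issues normally; mark it so that later
            # dependents are deferred until the miss resolves.
            self.waits_on[instr] = instr
        self.issue_queue.append(instr)
        return "issue_queue"

    def miss_resolved(self, load):
        # Traverse the wait buffer and insert the parked dependents of the
        # resolved memory access into the issue queue.
        for instr in self.wait_buffer.pop(load, []):
            self.waits_on.pop(instr, None)
            self.issue_queue.append(instr)
        self.waits_on.pop(load, None)

# Usage: one predicted-miss load with a two-deep dependence chain.
sched = DeferredInsertScheduler(lambda pc: pc == 0x40)  # toy predictor
sched.dispatch("ld1", [], is_load=True, pc=0x40)  # issues; miss predicted
sched.dispatch("add", ["ld1"])                    # parked in wait buffer
sched.dispatch("mul", ["add"])                    # transitively parked
sched.miss_resolved("ld1")                        # dependents enter the queue
```

Because the dependents enter the issue queue exactly once (after the miss resolves), this sketch mirrors the replay reduction claimed over extraction-based schemes, where each dependent would be issued, extracted, and issued again.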

Similar resources

Top 64 PCs by L1 Misses, Input Set B.xls

[Spreadsheet excerpt; columns: PC, L1 Misses, L2 Misses, Number of EAs]


Top 64 EAs by L1 Misses, Input Set B.xls

[Spreadsheet excerpt; columns: EA, L1 Misses, L2 Misses, Number of PCs]


Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors

It is critical to provide high performance for scientific programs running on a Chip MultiProcessor (CMP). A CMP architecture often has a shared L2 cache and lower storage hierarchy. The shared L2 cache can reduce the number of cache misses if the data are commonly shared by several threads, but it can also lead to performance degradation due to resource contention. Sometimes running threads on...


Characterization of Context Switch Effects on L2 Cache

EKER, ABDULAZIZ. Characterization of Context Switch Effects on L2 Cache. (Under the direction of Dr. Yan Solihin.) Multitasking is common in most systems. In order to use the processor resources efficiently, a multitasking system schedules processes to run for certain intervals by switching (saving and restoring) their contexts. However, since processes bring their own data to the cache when th...


Performance Characterization of the FreeBSD Network Stack

This paper analyzes the behavior of high-performance web servers along three axes: packet rate, number of connections, and communication latency. Modern, high-performance servers spend a significant fraction of time executing the network stack of the operating system—over 80% of the time for a web server. These servers must handle increasing packet rates, increasing numbers of connections, and ...




Journal:

Volume   Issue 

Pages  -

Publication date: 2006